Uncertainty Quantification in the Classification of High Dimensional Data

Authors

  • Andrea L. Bertozzi
  • Xiyang Luo
  • Andrew M. Stuart
  • Konstantinos C. Zygalakis
Abstract

Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework that brings together a variety of methods introduced in different communities within the mathematical sciences. We study probit classification [50] in the graph-based setting, generalize the level-set method for Bayesian inverse problems [27] to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier [7, 46] to a Bayesian setting; we also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [56]. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.

Similar resources

Quantification of identical and unique segments in ethylene-propylene copolymers using two dimensional liquid chromatography with infra-red detection

Hyphenating High Temperature High Performance Liquid Chromatography (HT-HPLC) with High Temperature Size Exclusion Chromatography (HT-SEC) (High Temperature Two Dimensional Liquid Chromatography (HT-HPLC x HT-SEC or HT 2D-LC)) leads to an isocratic elution in the second dimension, which in turn enables the use of an IR detector (quantitative detection) for monitoring the eluting polymers. Experimental...

Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...

The Quantification of Uncertainties in Production Prediction Using Integrated Statistical and Neural Network Approaches: An Iranian Gas Field Case Study

Uncertainty in production prediction has been the subject of numerous investigations. Geological and reservoir engineering data contribute a huge number of data entries to the simulation models. Thus, uncertainty in these data can largely affect the reliability of the simulation model. For these reasons, it is worthwhile to present the desired quantity with a probability distribution instead of a sing...

Forward and Backward Uncertainty Quantification in Optimization

This contribution gathers some of the ingredients presented during the Iranian Operational Research community gathering in Babolsar in 2019. It is a collection of several previous publications on how to set up an uncertainty quantification (UQ) cascade with ingredients of growing computational complexity for both forward and reverse uncertainty propagation.

3D Scene and Object Classification Based on Information Complexity of Depth Data

In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...

Journal:
  • CoRR

Volume: abs/1703.08816

Publication date: 2017